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Abstract 

In a recent paper [3], the Sequential Monte Carlo (SMC) sampler introduced in |12l 1191 [24] 
has been shown to be asymptotically stable in the dimension of the state space d at a cost that is 
only polynomial in d, when N the number of Monte Carlo samples, is fixed. More precisely, it has 
been established that the effective sample size (ESS) of the ensuing (approximate) sample and the 
Monte Carlo error of fixed dimensional marginals will converge as d grows, with a computational 
cost of 0(Nd 2 ). In the present work, further results on SMC methods in high dimensions are 
provided as d — > oo and with N fixed. We deduce an explicit bound on the Monte-Carlo error 
for estimates derived using the SMC sampler and the exact asymptotic relative L,2-error of the 
estimate of the normalizing constant. We also establish marginal propagation of chaos proper- 
ties of the algorithm. The accuracy in high-dimensions of some approximate SMC-based filtering 
schemes is also discussed. 

Key words: Sequential Monte Carlo, High Dimensions, Propagation of Chaos, Normalizing Con- 
stants, Filtering. 



1 Introduction 

High-dimensional probability distributions are increasingly of interest in a wide variety of applications. 
In particular, one is concerned with the estimation of expectations with respect to such distributions. 
Due to the high-dimensional nature of the probability laws, such integrations cannot typically be 
carried out analytically; thus practitioners will often resort to Monte Carlo methods. 
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An important Monte Carlo methodology is Sequential Monte Carlo samplers (see [T^JIH]). This 
is a technique designed to approximate a sequence of densities defined on a common state-space. The 
method works by simulating a collection of N > 1 weighted samples (termed particles) in parallel. 
These particles are propagated forward in time via Markov chain Monte Carlo (MCMC), using impor- 
tance sampling (IS) to correct, via the weights, for the discrepancy between target distributions and 
proposals. Due to the weight degeneracy problem (see e.g. |16j). resampling is adopted, sometimes 
performed when the ESS drops below some threshold. Resampling generates samples with replacement 
from the current collection of particles using the importance weights, resetting un-normalizcd weights 
to 1 for each sample. The ESS is a number between 1 and N and indicates, approximately, the number 
of useful samples. For SMC samplers one is typically interested in sampling a single target density 
on M. d , but due to some complexity, a collection of artificial densities are introduced, starting at some 
easy to sample distribution and creating a smooth path to the final target. 

Recently ( [2J HI I2B] ) it was shown that some IS methods will not stabilize, in an appropriate sense, 
as the dimension of target densities in a particular class grows, unless N grows exponentially fast with 
dimension d. In later work, [3] have established that the SMC sampler technique can be stabilized at 
a cost that is only polynomial in d. It was shown in [3] that ESS and the Monte Carlo error of fixed 
dimensional marginals stabilize as d grows, with a cost of 0(Nd 2 ). This corresponds to introducing d 
artificial densities between an initial distribution and the one of interest. The case of fixed d also has 
been analyzed recently [30] . 

The objective of this article is to provide a more complete understanding of SMC algorithms in high 
dimensions, complementing and building upon the results of [3]. A variety of results are presented, 
addressing some generic theoretical properties of the algorithms and some issues which arise from 
specific classes of application. 

1.1 Problems Addressed 

The first issue investigated is the increase in error of estimating fixed-dimensional marginals using 
SMC samplers relative to i.i.d. sampling. Considering the case when one resamplcs at the very final 
time-step we show that the L2-error increases only by a factor of 0{N~ 1 ) uniformly in d. Resampling 
at the very final time-step is often of importance in real applications; see e.g. [14] . 

The second issue we address is the estimation of ratios of normalizing constants approximated 
using SMC samplers. This is critical in many disciplines, including Bayesian or classical statistics, 
physics and rare events. In particular, for Bayesian model comparison, Bayes factors are associated 
to statistical models in high-dimensional spaces, and these Bayes factors need to be estimated by 
numerical techniques such as SMC. The normalizing constant in SMC methods has been well-studied: 
see [SI [TTj. Among the interesting results that have been proved in the literature is the unbiased 
property. However, to our knowledge, no results have been proved in the context of asymptotics in 
dimension d. In this article we provide an expression for fixed N of the relative L2-error of the SMC 
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estimate of a ratio of normalizing constants. The algorithm can include resampling, whereby the 
expression differs. The rate of convergence is 0(N" 1 ) 1 when the computational cost is O(NcP). The 
results also allow us compare between different sequences of densities used within the SMC method. 

The third issue we investigate is asymptotic independence properties of the particles when one 
resamples: propagation of chaos - see [11] Chapter 8]. This issue has practical implications which we 
discuss below. It is shown that, in between any two resampling times, any fixed dimensional marginal 
distribution of any fixed block of 1 < q < N particles among the N particles are asymptotically 
independent with the correct marginal. This result is established as d grows with TV fixed, whilst the 
classical results require N to grow. As in [3J [3D], this establishes that the ergodicity of the Markov 
kernels used in the algorithm can provide stability of the algorithm, even in high dimensions if the 
number of artificial densities is scaled appropriately with d. 

The final issue we address is the problem of filtering (see Section[5]for a description). Ultimately, we 
do not provide any analysis of a practical SMC algorithm which stabilizes as d increases. However, it 
is shown that when one inserts an SMC sampler in-between the arrival of each data-point and updates 
the entire collection of states, then the algorithm stabilizes as d grows at a cost which is 0(n 2 Nd 2 ), n 
being the time parameter. This is of limited practical use, as the computational storage costs increase 
with n. Motivated by the SMC sampler results, we consider some strategies which could be attempted 
to stabilize high-dimensional SMC filtering algorithms. In particular, we address two strategies which 
insert an SMC sampler in-between the arrival of each data-point. The first, which only updates the 
state at the current time-point, fails to stabilize as the dimension grows, unless the particles increase 
exponentially in the dimension. The second, which uses a marginal SMC approach (see e.g. |27j ) also 
exhibits the same properties. At present we are not aware of any online (i.e. one which has a fixed 
computational cost per time step) SMC algorithm which can provably stabilize with d for any model, 
unless N is exponential in d. It is remarked, as noted in [4], that there exist statistical models for 
which SMC methods can work quite well in high-dimensions. In relation to this, we then investigate 
the SMC simulation of a filter based upon Approximate Bayesian Computation (ABC) [20]. The ABC 
approximation induces bias which cannot be removed in practice. We show here that the simulation 
error stabilizes as the dimension grows, but we argue that the bias is likely to explode as d grows. This 
discussion is also relevent for the popular ensemble Kalman filter (EnKF) employed in high dimensional 
filtering problems in physical sciences (e.g. [T7]). 

The paper is structured as follows: In Section [2] we describe the SMC sampler algorithm together 
with our mathematical assumptions. In Scction[3Jour main results are given. In addition, we introduce 
a general annealing scheme, coupled with a consideration of stability results for data-point tempering 
[9] ; this latter study connects with our discussion in Section [5] on filtering. Section [4] considers the 
practical implications of our main results with numerical simulations. The filtering problem is ad- 
dressed in Section [3] We conclude with a summary in Section [SJ Most of the proofs of our results are 
given in the Appendix. 
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1.2 Notation 

Let (E,d>) be a measurable space and £P(E) the set of probability measures on it. For fi a a— finite 
measure on (E,^) and / a measurable function, we set /j,(f) = J E f(x)fi(dx). For /i £ &(E) and P 
a Markov kernel on (E,d>), we use the integration notation P(f)(x) = J E P(x,dy)f(y) and /iP(f) = 
J B fj,(dx)P(f)(x). In addition, P n (f){x) := / £ , n _ 1 P(x, dxi)P(a;i, daj 2 ) x ••• x P(/)(a; n _i). The total 
variation difference norm for /i, A £ &{E) is — A|| t „ := sup^g^ — A(A)|. The class of bounded 

(rcsp. continuous and bounded) measurable functions / : E — > K is written Bb(E) (resp. C(,(P)). For 
/ G Bt,(E), we write ||/||oo := su Pa;eR 1/(^)1- We will denote the L e -norm of random variables as 
\\X\\ e = E 1 / \X\ e with g > 1. For a given vector (xx, . . . , x p ) and 1 < q < s < p we denote by x q:s 
the sub- vector (x q , . . . , x s ). For a measure /i the A''- fold product is written fj,® N . For any collection of 
functions (fk)k>i, fk'-E—t- R, we write /i <8> • • • ® /fc : £* — ^ K for their tensor product. Throughout 
M is used to denote a constant whose meaning may change, depending upon the context; important 
dependencies are written as M(-). In addition, all of our results hold on probability space (H, J^",P), 
with E denoting the expectation operator and Var the variance. Finally, denotes convergence in 
distribution. 

2 Framework 

2.1 Algorithm and Set-Up 

We consider the scenario when one wishes to sample from a target distribution with density II on E d 
(E C R) with respect to Lebesgue measure, known point-wise up to a normalizing constant. In order 
to sample from II, we introduce a sequence of 'bridging' densities which start from an easy to sample 
target and evolve toward II. In particular, we will consider the densities: 

n n (x) oc n(x) 0n , x e E d , (l) 

for < 0o < • • • < (pn—i < 4>n < • ■ • < (ftp = 1. Below, we use the short- hand T n to denote un- 
normalized densities associated to II n . 

One can sample from {II„} using an SMC sampler that targets the sequence of densities: 

n-l 

Tl n {xi:„) = n n (l- n ) Y\_ L j( X j+U X j) 

i=i 

with domain [E d ) n of dimension that increases with n = 1, . . . ,p; here, {L n } is a sequence of artificial 
backward Markov kernels that can, in principle, be arbitrarily selected ( [T3j ) • Let {K n } be a sequence 
of Markov kernels of invariant density {II n } and T a distribution; assuming the weights appearing in 
the statement of the algorithm are well-defined Radon Nikodym derivatives, the SMC algorithm we 
will ultimately explore is the one defined in Figure [1] It is remarked that our analysis is not necessarily 
constrained to the case of resampling according to ESS. 
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0. Sample Xq , . . . X^ i.i.d. from T and compute the weights for each particle i £ {1, . . . , N}: 
Set n = 1 and 1 = 0. 

1. If n < p, for each i sample X l n \ X^ l _ 1 = x l n _ l from K n (x l n _ 1 , ■) and calculate the weights: 

W l--n = r„_i(4_!) W l:(n-1) ■ 

Calculate the Effective Sample Size (ESS): 

ESS lm (N) = gf;g"jC . (2) 

If ESS hn (N) < a: 

resample particles according to their normalised weights 

set I = n and re-initialise the weights by setting w\. n = 1, 1 < i < -/V; 
let x^, ... , x^ now denote the resampled particles. 
Set n — n + 1 . 

Return to the start of Step 1. 

Figure 1: The SMC samplers algorithm analyzed in this article. 

For simplicity, we will henceforth assume that T = n . It should be noted that when T is different 
from IIo, one can modify the sequence of densities to a bridging scheme which moves from T to II. 
However, in practice, one can make IIo a s simple as possible so we do not consider this possibility; 
see [30] for more discussion and analysis when d is fixed (note that our results for SMC samplers will 
hold, with some modifications, also in this scenario). Note, that we only consider here the multinomial 
resampling method. 

We will investigate the stability of SMC estimates associated to the algorithm in Figure Q] To 
obtain analytical results we will need to simplify the structure of the algorithm. In particular, we will 
consider an i.i.d. target: 

d 

n(z) = Y[ n ( x j) ; n ( x j) = ex p{ff(^)} . ( 4 ) 

i=i 

with xj 6 E, for some g : E i— > R. In such a case all bridging densities are also i.i.d.: 

d 

n„(x) oc ir n (xj) ; TT n (xj) oc cxp{(j) n g(xj)} . 
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It is remarked that this assumption is made for mathematical convenience: see [5] for a discussion 
on this. A further assumption that will facilitate the mathematical analysis is to apply independent 
kernels along the different co-ordinates. That is, we will assume: 

d 

K n (x,dx') = Y[ knix^dx'j) , 

3=1 

where each transition kernel k n (-, •) preserves 7r n (a;); that is, 7r„fc„ = n n . We study the case when one 
selects cooling constants <f> n = <j) n (d) and p = p(d) as below: 

p = d; n (= n , d ) = 0o + , 0<n<d, (5) 

with < 0o < 1 given and fixed with respect to d. It is possible, with only notational changes, to 
consider (as in [3D]) the case when the annealing sequence is derived via a more general non-decreasing 
Lipschitz function; see Section 13.2.11 As in [3] , it will be convenient to consider the continuum of 
invariant densities and kernels on the whole of the time interval [0o, 1]. So, we will set: 

ir s (x) oc 7r(a;) s = exp{s g(x)} , s £ [0 , 1] ■ 

Similarly k s (x,dx') with s £ (0o, 1] is the continuous-time version of the kernels k n {x,dx r ). As in [3J, 
the mapping ld(s) ~ [ is used to move between continuous and discrete time. 

2.2 Conditions 

We state the conditions under which we will derive our results. We will require that E C R with 
E being compact. The conditions below correspond to a simplification of the weaker conditions in 
[3j under the scenario of the compact state space E that we consider here. We note that imposing 
compactness has been done mainly to simplify proofs and keep them at a reasonable length. The 
numerical examples later on are executed on unbounded state spaces, and do not seem to invalidate 
our conjecture that several of the results in the sequel will also hold on unbounded spaces under 
appropriate geometric ergodicity conditions, as it was the case for the stability results as d — > oo in 
[3J. We remark that all results of [5] also hold under the assumptions stated here. 

(Al) Stability of {k s } - Uniform Ergodicity. 

There exists a constant 6 € (0, 1) and some <^ £ £?{E) such that for each s £ (0o, 1] the state- 
space E is (1, 8, <;)-small with respect to k s . 

(A2) Perturbations of {k s } . 

There exists anM<oo such that for any s,t £ (0Oj 1] we have \\k s — k t \\t v < M \s — 1\ . 

Note that the statement that E is (1, 9, <;)-small w.r.t. to k s means that E is a one-step small set for the 
Markov kernel, with minorizing distribution ? £ &(E) and parameter 6 £ (0, 1) (i.e. k s (x, A) > 9<;{A) 
for each (x, A) £ E x S). 
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In the context of our analysis, we will consider an SMC algorithm that resamples at the deterministic 
times t\{d), . . . , t m *(d)(d) 6 [<f>o, 1] (i.e. resamples after n = ld{tk{d)) steps for ft = 1, 2, ... , m* (d)) such 
that t (d) = (j) and t (d) < ti(d) < ■■■ < t m , {d) (d) < t m , {d)+1 {d) = 1, with l d (t m .(d)) < d. We 
will also assume that as d — > oo we have that m*(d) —> to* and tfc(d) — > ifc for tfc 6 [<po, 1] for all 
relevant ft. Such deterministic times are meant to mimic the behaviour of randomised ones (i.e. as for 
the case of the original algorithm in Figure [!} and provide a mathematically convenient framework 
for understanding the impact of resampling on the properties of the algorithm. Examples of such 
times can be found in [3l 1 1 3j ; the results therein provide an approach for converting the results for 
deterministic times, to randomized ones. In particular, they show that with a probability converging 
to 1 as N — V oo the randomized times essentially coincide with the deterministic ones. We do not 
consider that here as it would follow a similar proof to [3] , depending upon how the resampling times 
are defined. (An alternative procedure of treating dynamic resampling times is to use the construction 
in PQ; this is not considered here.) For simplicity, we will henceforth assume that d is large enough so 
that m*(d) = to*. 

2.3 Log-Weight-Asymptotics 

Given the set-up ((5|) and the resampling procedure at the deterministic times tx(d), . . . , t m - (d) £ [<f>o, 1], 
and due to the i.i.d. structure described above, we have the following expression for the particle weights: 

d 

i=i 

where G l k j = (1 - cf> ) Enitfe-Tw)) 9( X n.j) for 1 < i < N. The work in [3] illustrates stability of the 
normalised weights as d —> oo. Define the standardised log-weights: 

ld(t k (d))-l 

= (1 - <h) E (s(X l n>j ) - E 7rtfe _ i(d) [g{X i n j ) ] ) . (6) 

n=l d (tk-l(d)) 

The notation ^7r tk l(d) [g{^n j) ] refers to an expectation under the initial dynamics Xf,^ j ~ 
7Tijt_i(d)i after that, j will evolve according to the Markov transitions k n . We also use the notation 
E 8»d [ • ] when imposing similar initial dynamics, but now independently over all co-ordinates and 
particles; such dynamics differ of course from the actual particle dynamics of the SMC algorithm. In 
what follows, we use the Poisson equation: 

g(x) - TT u (g) = g u {x) - k u (g u )(x) 

and in particular the variances: 

al t = (1 - <t>o) J Mdl ~ k u {g u ) 2 )du , fa < s < t < 1 . (7) 
The following weak limit can be derived from the proof of Theorem 4.1 of [3J- 
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Remark 2.1 (Log- Weight- Asymptotics). Assume (Jfjj^) and g G Bt,(E). For any N > 1 we have: 



3 Etf*A\ =>(*)£i. 

3=1 

where the Z l 's are i.i.d. copies from N(0,at k l . tk ). 

The result illustrates that the consideration of 0(d) Markov chain steps between resampling times 
stabilise the particle standardised log- weights as d — > oo. 

3 Main Results 

We now present the main results of the article. 

3.1 Asymptotic Results as d — > oo 
3.1.1 L 2 -Error 

The first result of the paper pertains to the Monte-Carlo error from estimates derived via the SMC 
method. We will consider mean squared errors and obtain L,2-bounds with resampling carried out also 
'at the end', that is when one resamples also at time t = 1. Below, recall that the X-notation is for 
resampled particles. Resampling at time t = 1 is required when one wishes to obtain un-weightcd 
samples. We have the following result, with proof in Appendix IA. 21 

Theorem 3.1. Assume (A[MB and 9 <= Bb{E). Then for any N > 1, ip e Cb(E) we have: 



N 2 

( £ £ [ ¥>(*3,i) - <<P) ] ) a < Var^] i (1 + M(a tm , 



17 at 



for 

M(a 2 tm ,_ i:1 ) = e<--^+Me l - ^ 
with M < oo independent of N and of m :1 . 

Remark 3.1. Compared to the i.i.d. sampling scenario, the upper bound contains the additional term 
Var^c/?] M(o~t „ This is a bound on the cost induced due to the dependence of the particles. 

3.1.2 Normalizing Constants 

The second main result of the paper is the stability of estimating normalising constants in high dimen- 
sions. The quantity of interest here is the ratio of normalising constants: 

Cd J E * K{x)dx ^ 
' d ' J E 4 n^ {x)dx ' 
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We first consider the SMC sampler in Figure [T] with no resampling. Define: 

N N 



N 



7d 

i=i ?:=i 

where G] = (1 — cf>o) X)™=o sKj)- From standard properties of SMC we have E [ 7^ (1) ] = 7<j(l) = c<j. 
Now, consider the relative L2-error: 

V 2 ( 7rf (l))=E[(|«-l) 2 ] . 

We then have the following result, proven in Appendix IA. 31 

Theorem 3.2. Assume (Al-2) and g e Bb{E). Then for any N > 1: 



lim V a ( 7 d(l)) - 

The result establishes a C(iV _1 ) rate of convergence at a computational cost of 0(Nd 2 ). The infor- 
mation in the limit is in terms of the expression o'^ )Q . 1 . As in [3J, this is a critical quantity, which helps 
to measure the rate of convergence of the algorithm. 

We now consider the SMC sampler in Figure [T] with resampling at the deterministic times {tk(d)} 
described in Section ¥Z7Z[ We make the following definitions: 

jV 

where k S {1, . . . , m* + 1} and G^ . as defined in Section [2.3l As in the non-resampling case, we again 
have the unbiasedness property for the estimate of c^: E [ nl-Li" 1 IdkO-) ] ~ TlfcLi^ 1 7d,fc(l) = c d ■ We 
have the following result whose proof is in Appendix IA.3I 

Theorem 3.3. Assume (OCDUP and g £ B b {E). Then for any N > 1: 

m* + l m* + l 

hm V 2 ( [] 7d,fc(l)) = e"<- [I [^ e2< - i:tfc + (!- e^" 1 ' 1 * ] - 1- 
fe=i fe=i 

Remark 3.2. In comparison with the no resampling scenario, the limiting expression here depends 

upon the incremental variance expressions. On writing the limit in the form: 

m* + l 



e <*o:i 



[ e "t fc -i.«*[i + £{ e "«»-i** -1}] -1 



fc=i 



if N > (m* + l)(e CT — 1), wii/i c 2 = maxfc er 2 fc _ 1 . tfc , i/ien using similar manipulations to J3J Corollary 
5.2], we have that 

i. m v 2 ("ff^(i))< 2,m ' +1,(eFl - 1) 



"\ 11 - TV 

fc=i 



hence, the no resampling scenario has a lower error if this is less than the limit in Theorem \3.2\ The 
upper-bound also shows that the error seems to grow with number of times one resamples. The limit in 
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Theorem \3.S\ corresponds to the behavior o/E ®«d [7dfc(l) 2 /7c(,fc(l) 2 ] between each resampling time. 
In effect, the ergodicity of the system takes over, and breaks up the error in estimation of the ratio of 



normalizing constants to different tours between resampling times ( see Proposition \3.1 



To provide some intuition for Theorem 13.31 we give the following result. It is associated to the 
asymptotic independence, between resampling times, of the log-weig 

hts Ei=i 3 G k,j in ©• The P roof 

of the following can be found in Appendix IA.3I 

Proposition 3.1. Assume (OQO^ and that g £ Bb(E). Then for any N > 1, i £ {1, . . . , N} , c 1:k £ M. 
and k £ {1, . . . , m* + 1}, we have that: 

k 

d— >c 



lim E [ e E « c < 3 E?=i G tj 1 = TT E [ e c ' z ' } 

Z=l 

where Z l ~ -/V(0, cr| . t ) independently over 1 <l < k. 



3.1.3 Propagation of Chaos 

Finally we deduce a rather classical result in the analysis of particle systems: propagation of chaos, 
demonstrating the asymptotic (in the classical setting) independence of any fixed block of q of N 
particles as N grows. The following scenario, with the SMC sampler with resampling (at the times 
{tk(d)}), is considered: let s(d) be a sequence such that s(d) £ (tk-i{d), tk{d)) for some 1 < k < m* + l, 
with limit ,s £ {tk^i{d),tk(d)). Denote by the marginal law of any of the q particles out of TV 

at time s(d) and in dimension j £ {1, . . . ,d}. By construction, particles are considered at a time 
when they are not resampled. We have the following Propagation of Chaos result, whose proof is in 
Appendix IA.4I 

Proposition 3.2. Assume (JfJS^) and that g £ Bb(E). Then, for any fixed j £ {1, ,..,d} and any 

1 < k < m* + 1, a sequence s(d) £ (tk-i{d), tk{d)) with s(d) — > s, and any N > 1, with 1 < q < N 
fixed: 

lim ||P C ?^ ) .-7rf«|U = 0. 

The result establishes the asymptotic independence of the marginals of the particles, between any 
two resampling times, as d grows. This is in contrast to the standard scenario where the particles only 
become independent as N grows. Critically, the MCMC steps provide the effect that the marginal 
particle distributions converge to the target irf q . Thus, Proposition ^. 2l cstablishes that it is essentially 
the ergodicity of the system which helps to drive the stability properties of the algorithm. It should 
be noted that if one considers the particles just after resampling, one cannot obtain an asymptotic 
independence in d. Here, as in classical results for particles methods, one has to rely on increasing N. 
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3.2 Other Sequences of Densities 

In this section, we discuss some issues associated with the selection of the sequence of chosen bridging 
densities {II n }. 

3.2.1 Annealing Sequence 

Recall that we use the equidistant annealing sequence cf> n in ([5]). However, one could also consider a 
general differentiable, increasing Lipschitz function <f>(s), s G [0, 1] with 0(0) = <po > 0, 0(1) = 1, and 
use the construction <p r ^d = 4>( n /d)', then the asymptotic result in Theorem 12.11 (and others that will 
follow) generalised to the choice of § n ,d considered here would involve the variances: 



2.0 f ,~2 , I- \2\ \ d<Ku) 

<T S ;t = / ^(«)(50(«) -«0(u)W(«)J ) —fa- 



d<j)(u) , < s < t < 1 , (9) 



in the place of a 2 s . t in ([7]). Our proofs in this paper are given in terms of the annealing sequence ([5]), 
corresponding to a linear choice of 4>{-), but it is straightforward to modify them to the above scenario. 

This point is illuminated by our main results. For example Theorem 13.21 helps to compare various 
annealing schemes for estimating normalizing constants, via the limiting quantity Q. That is, if we are 
only concerned with variance one prefers an annealing scheme which discretizes 4>{s) versus one which 
discrctizes v(s) (differentiable monotonic increasing Lipschitz function with ^(0) = vq > 0, i>(l) = 1) 
for estimating normalizing constants if 

2,0 2,1/ 

e'o.i -1 < e"a.i -1 , , 2,0 < 2,v 

N — N °0:1 — "0:1 ' 

In practice, however, one has to numerically approximate <7g.'f and crg.'J', lessening the practical impact 
of this result. Similar to the scenario of Theorem 13. 2\ one can use Theorem 13.31 to compare annealing 
schemes </>(s) and v(s) (which potentially generate a different collection and number of limiting times 
< tf < ■ ■ ■ < tf n » < 1, to* and < t\ < ■ ■ • < t v m , < 1, m* respectively) via the inequality 

m J + l 2 2 '* 2 '* r «v + l 2 v 2 v 

e-S.'f n [±e <t-i*t+!&e*t-i<]<e-% ]J [ie^^^/'t.i 

k=l k=l 

but again, both quantities are difficult to calculate. 
3.2.2 Data Point Tempering 

An interesting sequence of densities introduced in [9] arises in the scenario when LT is associated with a 
batch data-set JJi, ■ ■ ■ ,y p - The idea is to construct the sequence of densities so that arriving data-points 
are added sequentially to the target as the time parameter of the algorithm increases. More concretely, 
we will assume here that: 

p d 



II(ar) oc exp I g{yk, Xj) > , x ■ e E , 

^=17 = 1 J 
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that is a density that is i.i.d. in both the data and dimension. In this scenario one could then adopt a 
sequence of densities of the form: 



{n d ^ 
k=l 3 = 1 ' 



As noted also in Remark 3.5 of [3], for d — > oo one cannot stabilize the associated SMC algorithm 
(described in Figure [lj as the ratio II„ + i/II„ explodes for increasing d. To stabilize the algorithm 
as d grows one can insert [d/p\ annealing steps between consecutive data points, thus forming the 
densities: 

n£°(aO cxexp L( J Jj ¥J )J29(y n ,x j )\ xMi) , ne{l,...,p}, k £ {1, . . . , |j J } , 

I. 3=1 > 

where <fi(s) is as in Section [3.2.11 with <j) a = 0. Then one can adopt Markov kernels K^ n \ 1 < n < p, 
of product form with each component kernel ki having invariant measure 

7r< n) (x) oc cxp I s ■ g(y n , x) + g(y k , x) \ , x £ E . 
^ fe=i ' 

We consider the scenario where there is no resampling and denote by ESS( 0j d)(n, TV) the effective 

sample size with n data after [d/p\ steps of the n-th SMC sampler. Throughout the data are taken 

as fixed. We have the following result, which follows directly from Theorem 3.1. of [3] and illustrates 

the stability of ESS( ,d) (n, TV) as d —> oo. 

Proposition 3.3. Assume conditions (j$7^) for the kernels ki, n S {1, ...,p}. Suppose that for 
each data index n £ {1, . . . ,p\, g(y n , ■) £ Bb(E). Then, for any fixed TV > 1 and n > 1, ESS(o.ci)(n, TV) 
converges in distribution to: 

(Ef=ie^) 2 



where Z l TV(0,cr 2 ) with 

- 2 - V / ir {k) ((a {k) ) 2 - k {k) (a (k) ) 2 ) 



a' 



fe=l 



E-li ^ 

d<j){u) 



In particular, 



lim E[ESS (0 . d) (n,TV)" 

d— yoo 



E 



e 



z' 



d(f)(u) 



As for the case with annealing densities, there is a direct extension to the case where one resamples 
(see [3]). In addition, one can easily extend the results in Section [3] in the data-point tempering case 
examined here. In connection to the filtering scenario, this is a class of densities that falls into the 
scenario of a state-space model with a deterministic dynamic on the hidden state (i.e. only the initial 
state is stochastic and propagated deterministically; see e.g. [7]). This hints at algorithms for filtering 
which may stabilize as the dimension grows. As we shall see in Section O it is not straight-forward to 
do this. 
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4 Numerical Simulations 

We now present two numerical examples, to illustrate the practical implications of our theoretical 
results. It is noted that the state-space E is not compact here, yet the impact of our results can still 
be observed. 

4.1 Comparison of Annealing Schemes 

We consider a target distribution comprised of d i.i.d. -/V(0, 1) co-ordinates. The bridging densities are 
in this case: 

7T0( s) (x) oc exp { - \4>{s)x 2 } . (10) 
We will in fact consider two annealing schemes: 

<t>(s) = <t>o + (1 - 0o)s ; 

"(«) = *£=r + (§^)e*'. (ii) 

These are graphically displayed in Figure [5] (a), with i? = 5. 

The purpose of investigating the two annealing schemes is as follows. In practical applications of 
SMC samplers we have observed that algorithms with slow initial annealing schemes can often out- 
perform those with faster ones (see Figure [2] (a)). Thus, we expect scheme v(s) to perform better 
than (j){s) w.r.t. the expression for the asymptotic variance ([9]), hence deliver a lower relative L2- 
error for the estimation of the normalizing constant in high dimensions. To obtain some analytically 
computable proxies for the asymptotic variances (|9]) we use the variances that one would obtain when 
k s (x,dx') = Tr s (dx'), that is we substitute TT^ u) (g 2 ) - Tr^ u )(g) 2 for ^(u) (<^ (u) ~ ^(«)(?0(u)) 2 ) in ®- 
In this scenario it is simple to show that, under the choice ([TO]) , we have that: 



/■ 


1 d(j)(u)~ 


/o 


4>(u) du 



du . 



Figure [2] (b) now plots the analytically available variances a 2 '.f and a 2 '." (broken line) against (f>Q. The 
graph indeed provides some evidence that that the scheme v(s) should give better results. This is 
particularly evident when tfio is small; this is unsurprising as one initializes from IL$ , hence if (f>o is 
closer to 1 one expects a constant increase in the annealing parameter to be preferable to a slow initial 
evolution. 

We ran SMC samplers with both annealing schemes with N = 10 4 particles and different dimension 
values d £ {10,25,50}. The choice <j)o = k is used for both annealing schemes. We used a Markov 
kernel k s (x, dx') corresponding to a Random- Walk Metropolis with proposal y = x + N(0, 5^)' * nus 
the proposal variance is 1/25 times the variance of the starting distribution of the bridge N(0, 4^); this 
is a choice that gave good acceptance probabilities over all d bridging steps of the sampler. Multinomial 
resampling was used when the effective sample size dropped below -y. We made 50 independent runs 
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Dimension d 


10 


25 


50 


Ratio of Variances (for (j)(s) over v(s)) 


2.32 


3.47 


7.05 



Table 1: The empirical variances (oven 50 independent runs for the SMC sampler) of the log Ratio 
(fT2|) using the annealing <fi(s) over the corresponding variances for the annealing sequence v{s). 




20 40 60 80 100 0.00 0.01 0.02 0.03 0.04 0.05 

s 

(a) Annealing Schemes, 0o = 0.01 (b) Asymptotic Variances 



Figure 2: Panel (a): Plot of the annealing parameters 4>(s) and i/(s) (broken line) defined in (JTTJ) 
against time when c/)q = 0.01 and i} = 5. Panel (b): A Plot of a^lf and <f^:\ (broken line) against <f>o 
for the case of exact sampling with k s (x, dx') = ir s (dx') and the scenario (fTT)|) . 

of the algorithm, and calculated the corresponding realisations of the log Ratio: 

m*(rf) + l 

fc=i 

(note that now the resampling times and their number are random) and their sample variance. This 
experiment was carried out for choices of dimension d £ {10, 25, 50} and for both annealing schemes 
</)(s) and v{s). The ratio of the obtained variances for the annealing sequence <f>(s) over z/(s) are shown 
in Table [T] The results confirm our theoretical findings above for the superiority of v{s) over (j){s) 
based on the analytical expression for the asymptotic variance even for moderate d. 

4.2 Bayesian Linear Model 

We now consider the implications of main results in the context of a Bayesian linear regression model 
(see [T5] for a book- length introduction as well as a wealth of practical applications). This is a statistical 
model that associates a p-vector of responses, say Y , to a p x d-matrix of explanatory variables, X , 
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Time- Steps 


d 


bd 


lOd 


Relative Error 


4.75 


4.47 


3.9 



Table 2: The mean square error when estimating E[/?i F] for the (annealed) SMC sampler over 100 
repeatitions relative to i.i.d. sampling. 1000 particles are run and we resample at the final time step. 
The error is calculated for different choices of number of time-steps for the SMC sampler. 

for some p > 1, d > 1. In particular: 

Y = X/3 + e 

where (3 is a d-vector of unknown regression coefficients and e ~ N p (0, with l p the p x p identity 
matrix. A prior density on (3 is taken as Nd(0, Id) which yields a posterior density found to be the 
d-dimensional Gaussian Nd{{ld + X'X)~ 1 X'Y, (Id + X'X)^ 1 ) where X' denotes transpose. This is 
the target distribution for our SMC sampler. 

The objective is to investigate the bound in Theorem 13. II and the implications of Proposition 13.21 
Note that the target distribution is not of product structure here. The data-point tempering method 
(see Section [3.2.2[) is also compared with annealing. We consider the case d = 50, p = 50 with N = 10 3 ; 
the data are all simulated. The annealing scheme v in (fTTj) is adopted as well as the data-point 
tempering method with [lOd/pJ steps between the p data point arrivals. Particles are propagated 
along the bridging densities via Markov kernels corresponding to Random- Walk Metropolis within 
Gibbs: the proposal for a univariate co-ordinate x conditionally on the rest is y = x + N(0, j^). 
Dynamic resampling according to the ESS is employed (threshold ^) as wcu as resampling at the 
last time step (see Theorem 13. 1|) . For the annealing scheme the number of SMC steps is scaled as a 
multiple of d). This increase at the number of time steps aims at illustrating the propagation of chaos 
(Proposition 13. 2j) . We fixed d = 50 for computational cost considerations, but the SMC algorithms 
will easily stabilize for much larger d. 

Each SMC method employed is repeatedly applied 100 times. We calculate the mean square error 
for the estimation of E [/3i \ Y] (analytically available here) over the 100 replications and we compare 
with the corresponding error under i.i.d. sampling of the posterior of the results are reported in 
Table [21 In the table we can observe the increase in mean square error of the (annealed) SMC algorithm 
to i.i.d. simulation. The increase here is not substantial as indicated by Theorem 13.11 although one 
may need to take d very large (and have an i.i.d. target) before the bound in Theorem 13. II is realized. 
As the number of time-steps increases we can observe an improvement. This is due to increase in 
diversity of the population, which improves the SMC estimate even when resampling at the end. For 
the data-point tempering method (the CPU time is roughly comparable with the case of 10d time- 
steps of the annealed SMC) the corresponding value of the relative mean square error is 7.5, which is 
slightly worse than the annealing scheme. In general, it is difficult to draw a definite conclusion on 
which scheme may be better. 
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5 Filtering 

An important application of SMC methods is filtering. In the following we will look at the effect of 
dimension for several filtering algorithms. 



5.1 Set-Up 

Consider the discrete-time filtering problem; we have fixed observations yi, . . . ,y n , with y k £ R » and 
a hidden Markov chain Xi, . . . , X n , with Xf. S E d such that the YJ-'s are conditionally independent of 
all other variables, given X k . We assume that the density w.r.t. Lebesgue measure is 

d 

g{yk\xk) = expi *S2h(yk,Xk,j) } (13) 



cxp j X^( yfc '' 

L .7 = 1 



3- 

with Xk.j 6 E and h : R d « x E —> R. The hidden Markov chain is taken to be time-homogeneous with 
transition density w.r.t. Lebesgue measure: 

d 

F(xk\xk-i) = Y\_ f( x k,j\xk-i,j) , k > 1 , 

where xo = (xo,i,xo,2, • ■ • j^o.d) £ is a given fixed point and J E f(x'\x)dx' = 1. This is certainly a 
very specific model structure chosen, as in the previous part of the paper, for mathematical convenience. 
Clearly, our results depend on the given structure; it is certainly the case that for some other classes 
of state-space models standard SMC methods could stabilize with the dimension of the problem (as 
could be the case for instance when the lihelihood in (fT3"|) involved only a few of the co-ordinates of x). 
The objective in filtering is to compute for a 7r-integrable function ip: 

^W{X n )\yi:n]= \ l P{x)lTn{x n \yi.. n )dx n (14) 
JE 

where 



S E (n-i)d IlLi g(yk\xk)F(x k \x k -i)dx 1:n -i 



Ie"* rife=i g{yk\xk)F{xk\x k -i)dx 1 .. n 
It should be noted that one can re-write the filter via the standard prediction-updating formula: 

g(y n\Xn)^n\n— 1 [Xn \y\in— l) 



^n(Xn\yi:n) 



Kn\n-l(x n \yi:n-l) = / F{x n \x n -i)'K n -l{Xn-l\yi:n-X)dx n - 1 
JE d 



p(y n \yi:n-i) = 



p{yn\yx:n-l) 

■En \%n— 1 )^"n— '. 
Je»* llLl g(yk\Xk)F(Xk\Xk-l)dx 1: r 



Je^-v* rife=i g(yk\xk)F(x k \xk-i)dx 1 .. n - 1 
Typically, one cannot calculate (fT4"]) . so we resort to particle filtering. The most basic approach is 
to perform an SMC algorithm, which approximates the sequence of densities 

n 

7r(xx :n \yi:n ) K H g(yk\xk)F{x k \x k -i) 

k=l 
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by using the prior dynamics, characterised by F, as a proposal. This yields an un-normalized incre- 
mental weight at time step n of the algorithm which is g(y„\x n ). One can then resample or not. We 
consider the scenario as the dimension of the state increases, when the data record is fixed (i.e. we 
keep the time parameter n and data fixed). 

In reference to the works [21 HI El EE] , it is clear that the standard particle filter cannot be used 
in general to approximate filters with high-dimensional states. An alternative, as mentioned for the 
data-point tempering scenario in Scction [3.2.2l (see also [4l [18] ) is to insert an annealing SMC sampler 
between consecutive filtering steps, updating the entire trajectory x\- n G E nd . 

Assuming the i.i.d. structure above, the model dynamics decompose over d independent co-ordinates. 
One could use MCMC kernels for the SMC samplers between arrivals of consecutive data-points, with 
each univariate kernel (the product of d of them forming the complete kernel) having invariant density: 

n-1 

ir^{xi-n) oc exp j(/>(!) h{y n ,x n ) + log(/(x„|x„_i)) + ^{% if aij) + \og(f(x i \x i _ 1 )) } j , 

i=l 

for Xi :n € E n , (f)(0) = 0, and < p < d. We write these marginal target densities at data-time n 
on the continuum as tt^ with associated Markov kernels fci"^ (which operate on spaces of increasing 
dimension). No resampling is added to the algorithm, but easily could be. The ESS (see (EJ) in 
Figured]) is denoted ESS(o,<i)(ji, N). It is a straight-forward application of Theorem 3.1 of [3] to get 
that, under assumptions (A[I]E1) for the kernels k^ n \ n > 1, and the condition that for each n > 1, 
h(y n ]Xn) £ Bb(E) (with the associated solution to Poisson's equation written c/s), for any fixed 
N > 1, n > 1, ESS( .d)( n i A) converges in distribution to 

[e£i« z< ] 2 

where Z 1 ' A(0, a 2 ) with 

cr2 = / ^>(i)((^>(i)) 2 ~ fc 0(i)(50(i)) 2 ) 



fc=l 



d(f>(u) 
du 



d(p(u) 



In particular, 



lim E[ESS (CM) (n,A)" 



i=l 

2^2/ 



(15) 



Thus the cost for this algorithm to be stable as d — > oo, is 0(n d N). This result is not surprising; one 
updates the whole state-trajectory and the stability proved in [3] is easily imported into the algorithm. 
However, this algorithm is not online and will be of limited practical significance unless one has access 
to substantial computational power. The result is also slightly misleading: it assumes that the MCMC 
kernels have a uniform mixing with respect to the time parameter of the HMM (see condition AfT]). 
This is unlikely to hold unless one increases the computational effort, associated to the MCMC kernel, 
with n. 
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One apparent generalization would be to use SMC samplers at each data-point time to sample 
from the annealed smoothing densities, except freezing the first n — 1 co-ordinates; it is then easily 
seen that one does not have to store the trajectory. However, one can use the following intuition as 
to why this procedure will not work well (the following is based upon personal communication with 
Prof. A. Doucet). In the idealized scenario, one samples exactly from the final target density of the 
SMC sampler. In this case, the final target density is exactly the conditionally optimal proposal (see 
|16j ) and the incremental weight is: 

/ g{y n \Xn)F{x n \x n ^i)dx n = TT / e h{ - Vn,Xn - s ' ) f{x n> j\x n -i t j)dXn,3 , 

which will typically have exponentially increasing variance in d. We conjecture that similar issues arise 
for advanced SMC approaches such as [ID] . 

5.2 Marginal Algorithm 

Due to the obvious instability of the above procedure, we consider another alternative, presented e.g. 
in [27]. This algorithm would proceed as follows. When targeting the filter at time 1, one adopts 
an SMC sampler, which as discussed before, will stabilize with the dimension. Then at subsequent 
time-steps, to consider the initial density of the SMC sampler: 

N d 

E^^n/Kf^,?- 15 ) (i6) 



where we have defined: 

J,(n-i) 



1=1 3 = 1 



d d-l N 



j=l i=0 1 = 1 

as the normalized weight. One could then resample and apply SMC samplers on the sequence of target 
distributions (e.g. with n = 2) 

N d 



(«)/ 



)(xcx P {0(^)^%„,x,)}[^n/(^i< ( r 1) )] . fce{o,...,d}, a?) 



j=l 1=1 j=l 



where x^" ^ is the resampled particle when sampling from ([T 

Using simple intuition, one might expect that this SMC algorithm may stabilize as the dimension 
grows. For example, if one could sample exactly from nf" (xi : d), then the importance weight is exactly 
1; there is no weight degeneracy problem: As proved in [3], under ergodicity assumptions, the SMC 
sampler will asymptotically produce a sample from the final target density. 

However, the following result suggests that the algorithm will collapse unless the number of particles 
grows exponentially fast with the dimension. Consider the case with Nd particles, where N depends on 
d here, and these are samples exactly from the previous filter; denote the samples X\:^ d . Conditionally 
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upon X^; d d , sample X]^. d d exactly from the approximation 7r 1 (fTT|) . This presents the most optimistic 
scenario one could hope for. Denote below, 1p n (x) = (f(x) — Tr n ((f). We have the following result. 

Proposition 5.1. Consider the algorithm above so that for any (x,x') £ E 2 we have f(x\x') £ (/,/) 
for constants < / < / < oo and for any (y,x) £ R dy x E, h(y\x) £ (h,h) for < h < h < oo. 
Suppose ip £ Bb(E). Then, there exist an M < oo and k > 1 such that for any d > 2, j £ {1, . . . , d} 
and iV<j > 1 we ftaue 



E 



iV, 



(n) 

Proof. Throughout the proof, write ir d as the approximated filter at time n. Then we have the simple 
decomposition: 



N d 



< 



Nd 



N d 



E 



E 



N,, 



1=1 " 1=1 

On applying the Ci— inequality, we can decompose the error into the one of the Monte Carlo error of 

(n) 

approximating expectations w.r.t. ir d and that of approximating the filter. 

Consider the first error. Conditioning on the i.i.d. samples drawn from the filter at time n — 1, 
one may apply the Marcinkiewicz-Zygmund inequality and use the i.i.d. property of the algorithm to 
obtain the upper bound: 



' M ' 



E 



■h Et d i fie h ){xi) r £ ZiJi f(^ h )(xi) w w f(e h )(xi) 



iE^n?=i/M(^) 

As and /i are bounded, it is easily seen that the function in the expectation is uniformly bounded in 
d and hence that one has an upper-bound of the form M/Nd- 
Now to deal with the second error, this can be written: 



-.Et^UUf^ixi) 



N, 



E 



]fe £ 4 =i f{^ h ){X)) f(e h )(Xi) n n ^f^e h )n n ^f(e h ) 



h\d-l 



*n-if(e h ) 



h\d 



i=l 1=1 



The bracket can be decomposed into the form: 

w: T,fJi f( i pe h )(x i j ) f(e h )(xi) 

{* n -if(e»)*}± nf =1 /(e h )(*/) 

1 / 1 Nd 

ir n -if(e h ) d \N d ^ ^ 

Applying the C2 — inequality again, we can break up the two terms. Using the lower bound on /i and 
upper-bounds on y> and /i the L2-error of the first term is upper-bounded by: 



Ml 



y( e h)2d e 2h 1^ 



^/(eY-^En^)^') 



i=l £=1 



Ml: 



A r d7rn-l/(e h ) 2 



-Var 



a 

[t[f(e h )(XD 



1=1 
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As the variance on the R.H.S. is easily seen to be equal to TT n ~i{f{e h ) 2d ) — n n _i(f(e h )) 2d one yields 
the upper-bound: 



_7r„-i(/(efc)) M 

For the second term one can follow similar arguments to yield the upper-bound: 



M 
N~ d 



1 



N d v n -i[f{e h )] 2d 
from which one can easily conclude. 



7r„_i[/(^)| 2 ]7r n -i [f{e h f]^ - ^ [/(^)] 2 tt„-i [/(e' 1 )] 2 ^ 



□ 



Remark 5.1. On inspection of the proof, it is easily seen that one can write the error as ^-(1 + K ) 
which represents two sources of error. The first is the Monte Carlo error due to estimating the marginal 
expectation w.r.t. the approximation. This appears to be controllable for any Nd converging to infinity. 
The second source of error is in approximating the filter, which seems to require a number of particles 
which will increase exponentially in the dimension; this is the drawback of this algorithm. 

Remark 5.2. We remark that this is only an upper-bound, but we can be even more precise; if one 
considers the relative 1.2-error of the estimate of p(y n \yun-i) then this is equal to 



1 



n n -i(f(e h ) 2 ) d 



1 



7r n _i(/(e^))^ 

which will explode in the dimension, unless Nd grows at an exponential rate. This is in contrast to the 
SMC sampler case in Section^ where one can obtain an estimate of the normalizing constant, whose 
relative L2 -error stabilizes for any N. 



5.3 Approximate Bayesian Computation (ABC) 

In this section we consider SMC methods in the context of an ABC filter - an approximate filtering 
scheme which is of practical interest when evaluation of the likelihood function in the state-space model 
is intractable. We note in passing the connection of the ABC methods to the ensemble Kalman filter 
[25] , a full treatment of the latter is well beyond the scope of the present work. 
The idea of this approach, which is primarily adopted when: 

• The function g{y\x\-.d) is intractable, that is, one cannot evaluate it point-wise. 

• It is possible to simulate from 9 ( • | x 1 : ^ ) for any x\ : d G E d . 

In this scenario, standard SMC methods can be used to sample from an approximation of the smoothing 
density, of the form (for some e > 0): 

n 

TT e (xi :n , UUn\yu n ) OC \\_l{ Uk ,\ yk ^ Uk \ <c }(u k ) g{u k \x k ) f {x k \x k -i) . (18) 
k=l 
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Here, the idea is to sample, at each time-point, pseudo-data 6 M. d «; the density is non-zero when all 
of the simulated pseudo data lie within e of the observed data (in Li-distance). Adopting an SMC algo- 
rithm with proposals g® f yields an un-normalized incremental weight of the form \i Uk .\ yh _ Uk \ <e \{uk), 
which circumvents the evaluation of g. In the context of high-dimensional models, as discussed here, 
the SMC algorithm will collapse using the approaches in the previous sections. However, when d y is not 
large, one would expect that indeed, the SMC approximation of the ABC filter should be reasonably 
stable (in some sense). We quantify this with the following result. 

We will assume here conditions (A1-A3) of [20]. In particular, that: 



sup||5(-,xi : d) 



g dsU Pm 1:d \\ h (^ X l:d)\\c 



sup\g(y,xi; d ) - g(u,xi :d )\ < M\y - u\ 



The latter assumption will typically only hold when E C R with E being compact. We consider only 
the scenario where no resampling is performed. The expectation with respect to the SMC algorithm 
conditioned on the fixed data (which is suppressed from the notation) is written as E. Also, we write 
simply I4 in the place of I r i-v w nn w / jw n i- 

Proposition 5.2. Given the set-up above, one has that for any n > 1, p > 1, <p 6 Bt(E), e > there 
exists an M(n,p,ip,e) > 0, and for d > 1 there exists K n (d,tp) > which does not depend upon p,e 
such that for any N > 1 : 

N TT" TT I. A \ V -I 1/p 



E 



E 



n 



k=l Hu k :\y k - 



-u fc |<e}(Wfc) 



Sj'=l Ilfe=l ^{u k :\y k ~u k \<e}{u{) 

where K n (d,(p), as d —> oo, converges to zero or diverges to infinity. 
Proof. Write 



(19) 



7r n,e(^) 



y(a; n ,i)7r e (a:i ;n ,ui : „|yi :T[ )da:i :n dui :r 



Then, one can add and subtract this term in the | • \ p and apply Minkowski, leading to: 



E 



N 



Uk=l l {u k :\y k -u k \<e}(ul.) 



1/p 



^1 Sj=l rifc=l I{« fc :|y fc - Ufc |<e}(Wfc) 

The first term is easily dealt with using standard proof techniques in Monte Carlo computation; see for 
example part of the proof of Theorem 3.3. of Hence we need only treat the bias term. Following 
the arguments of Theorem 1 of |20| . one can obtain that an upper bound on the bias is: 



K n (d, ip) 



( 2L + K n -i(d, <p)e dsup * IIMf".*)!!- ) 



^/ E 2 eMvn.*)/(x|x')7r„_i,i(a'|2/i :n _i)dx<ia;'J 

where tt^—x,! is the filter at time n — 1 marginalized to its first component and Ko(d, (p) = 0. 



□ 



Remark 5.3. The above result is of interest for high- dimensional filtering. Essentially it establishes 
that the SMC approximation of the ABC approximation is stable for any d > 1, with computational 
cost of O(Nd); this is the first term on the R.H.S. of (|19[) . However, the deterministic component of 
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the ABC approximation of the filter is likely to deteriorate as d —> oo. For any n > 1 the sequence 
(K n {d,<p))d>i is likely to diverge, as for example when n = 1 it is proportional to 

Whilst this is only an upper-bound, we will see in Section \5.4\ that the error seems to increase with d 
in simulations. 

Remark 5.4. Due to the link between ABC and EnKF f25^ and the bias of the EnKF f23\j . we 
conjecture that the EnKF will be subject to a similar behavior as for ABC (for non-linear models). 
That is, one can numerically approximate the approximation of the filter in high dimensions, but that 
the approximation collapses as the dimension of the state grows. 

5.4 Numerical Example: Linear Gaussian State-Space Model 

In the following example we consider the ABC approximation error of linear Gaussian state-space 
model, k > 1: 

Y k = HX k + V k ■ 
X k = X fe _a + W k , 

where d y = 1, H = (1, . . . , 1) is a 1 x d vector, V k N^O, 1), X = (0, . . . , 0)' and W k L ~ ' N d (0, l d ). 
200 data points are generated from the model. 

We run the SMC-based ABC algorithm in [20] with the parameter e (see (fT8|) for the approximation 
of the smoothing density) fixed at 5, with d € {10,40,200}. The first moment in the first dimension 
is estimated and the quantity on the L.H.S. of (flUf (associated to this) is estimated with N = 1000 
with 50 repeats. The results can be observed in Figure [ST4l where the estimate of the L.H.S. of (fT9l) 
over times 1 to 200 for d = 200 against d = 10 (black) and d = 40 (broken blue) is plotted. It appears 
that the error grows with d. It is remarked that in general as the model is not i.i.d. (in dimension) 
one cannot guarantee a uniform per-time step increase in the error. However, the relative increase in 
the error, w.r.t. dimension, is consistent with our empirical experience with applying the algorithm. 

5.5 Discussion on High-Dimensional Filtering 

One of the motivations of our work was to investigate the issue of stability in high dimensions of 
SMC algorithms for filtering. As can be seen, there is still much scope for future work and this issue 
is far from resolved. In particular and in relation to this issue (as established in Section 15. 3|) what 
is currently missing in the literature is a concerted effort from probabilists, statisticians and applied 
mathematicians on the analysis of algorithms used in data assimilation (at least one exception is [23] ). 

In relation to the above discussion, one potentially very fruitful starting point, is the filtering of the 
Navier stokes equation; see [5]. In this context, one is given the 2-dimensional Navier-Stokes equation 




23 




Figure 3: A Plot of the Relative L2-Error of the ABC Filter Algorithm. The black line is the IL-2-error 
for estimating the first moment in the first dimension for 200 dimensions against 10 dimensions over 
200 time steps. The broken blue line is similar except for 200 dimensions versus 40 dimensions. 

on a torus and the objective is to infer the initial condition of the PDE, given access to noisy data. In 
mathematical terms, given a Gaussian prior on x the initial condition, on Hilbert space % one seeks 
to deal with the density of the filter w.r.t. the prior (see Theorem 3.2]) 

Kk{x\yi-.k) « cxp{-$ fc (y 1:fc ,a;)} 

where x £ TL and : R fc x T-L — > R a potential function associated to observed data yi-.k £ R- This is 
an example of trying to filter a state with deterministic dynamics, albeit in infinite dimensions. Given 
the analysis in Section 15. 2. 2 [ it is likely that, by defining a data-point tempering SMC algorithm on the 
infinite dimensional space, one can consider the associated finite-dimensional algorithm (with state- 
vector in R d ) and establish the stability as d grows. This is assuming one has sufficiently mixing MCMC 
kernels; see [26] for some ideas. This issue is a subject of current research jointly with Prof. A. Stuart. 

More generally, one is interested in the issue of when the dynamics of the hidden state are Marko- 
vian. As noted here and in more details in [4], the underlying structure of the state-space model is 
important for establishing some sort of stability in high dimensions of a numerical filtering algorithm, 
SMC or otherwise. Akin to the standard problem of filter-stability in time (e.g. [29]), perhaps con- 
siderable research is needed with regards to dimension of the deterministic (true) filter, before a full 
analysis of numerical algorithms can be undertaken. However, it is not obvious how that analysis can 
be undertaken. 



6 Summary 

In this paper we have considered the stability of SMC methods in high-dimensions. In particular: the 
L 2 -error of marginal estimates, the L 2 -relative error of the normalizing constants and propagation of 
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chaos properties. The stability of some SMC-bascd filtering algorithms have also been investigated. 
Some directions for future work are as follows. 

Firstly, in the context of normalizing constants, one direction is the consideration of rare events 
problems. Following [8], it is possible to obtain computational complexity results for some rare events 
problems. However, one can pose some rare-events problems in terms of the dimensionality. Our 
results would, in many cases, not apply to this scenario and an extension to this case is important. In 
the case where one uses SMC samplers to sample from 'twisted' target densities (see [21]) the analysis 
adopted here can be applied. However, one would still need to verify that the path-sampling-based 
estimate will stabilize as d grows. 

Secondly, for normalizing constants, we have only considered the relative L,2-error. It would be 
of interest to consider e.g. logarithmic efficiency or higher-order errors. In addition, we have only 
considered one particular important functional that grows with d. More generally, when one can 
perform estimation with direct Monte Carlo, with a cost which is less than exponential in d, is it 
possible to do this also with SMC methods? 

Thirdly, and rather importantly, is it possible to find any online SMC algorithm to solve the filtering 
problem in general, whose cost does not increase exponentially in the dimension? At present, our only 
suggestion is the accept/reject scheme in [TTj. We are currently investigating the stability properties 
of this algorithm. It could be that in general, as noted above, one cannot obtain a stability result as 

in m- 

Finally, one could considerably weaken the the hypotheses made in this article. Given the number 
of exponential moments that we need to treat, it seems that multiplicative drift conditions |22| could 
be adopted; see [31]. 
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A Proofs 

A.l Preliminary Results 

We summarize in Lemmas IA.1I and IA.2I below some results required in the proofs obtained in [3] or 
implied directy from results in that paper. Recall the definition of G\ ^ from ([6]). 

Lemma A.l (G- Asymptotics) . Assume (A\TH^j and g G Bb{E). 
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i) Under the starting distribution ^} d ^ tk ^d)) 1 . d ~ 7T f l . N 1 (d) we ^ ave that: 

d 



zij PFe have that [ E^-i [ Gt • ] | < M, and for any p>2: 

! £i(*Js-l( d ))'J 

IC^I* <Md« vl . 

! d< f A,-l< d ))'J J 

Under either ftfj^f^ as in i) or the actual particle distribution we have that: 



: c a t 



iv) We have that: 

d 

iVEy, [Gl,l -+0 , in Li ; 

3=1 

Proo/. 

i) Both weak limit follows from the proof of Theorem 3.2 of [3J. Notice, that a minor difference 
is that instead of the fixed times </>o and 1 considered in Theorem 3.2 of [3] we now sum terms 
between the varying time instances tk-\(d) and t^(d). However, the proof for this case follows 
trivially from the proof for the fixed times due to the limits t^—x(d) — > ffc_i and t^{d) — > tf~. 

ii) All these results follow directly from Theorem A.l of [3J. 

iii) This follows from the CLT's in parts i) and ii) and the uniform integrability result obtained in 
Lemma IA.6I 

iv) The first result corresponds to Proposition C.4 of [3J. The second result is shown in the proof of 
Theorem 4.1 of [3]. 

□ 

Lemma A. 2. (Convergence of Marginal Laws) Assume (j$J$^) and g S Bb{E). Then we have: 

i) For a sequence of times s(d) £ (<po,l) with t^i(d) < s(d) and s(d) — > s G (£fc_i, 1) and the 
collection of time steps u(d) = (ld(tk-i(d)) + 1) : ld(s(d)) we have that as d — > oo: 

\\K(d){Xi d (t k _^d)) t i) -7rt h _i(d)fcu(<i)l|t« , in Li ; 
W^tk^dfiuid) -ir s (d)\\tv ->■ . 
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ii) For a sequence of times s(d) € (fa, I) with tj c -\(d) < s(d) and s(d) — > s G and the 

collection of time steps u(d) = Zd(ifc-i(^)) : ld(s(d)) we have that: 

(m 1:N X 1:N \ =± ( eZl N V 1:N \ 

\ w u(d)> A J d (s(d)),l J =^ I £« ie z< ' r / 

where {Z' 1 }^-^ are i.i.d. copies from N(0,a^ k _ i . s ) and, independently, {Y l }f =1 are i.i.d. copies 
from ir s . 

Proof. 

i) The first result follows by the proof of Proposition C.4 of [3J; the second result from Proposition 
A.l of [3J. 

ii) The weak convergence of the weights is analytically illustrated in the proof of Theorem 4.1 of [5J. 
The weak convergence of the positions of the Markov chain is proven in Proposition A.l of [3j. 
The independence between the Z v - N and Y VN limiting variables follows trivially from the fact 
that any single co-ordinate has a vanishing effect on the weights as d — > oo. 

□ 



A. 2 JL 2 -Error 

Proof of Theorem \3.1\ We begin by noting that, due to exchangeability of the particles: 



E 



N 
i=l 



1 EK^On + l^El^)^ 2 !)] (20) 



/v 



where we have set Tp{x) = ip(x)—Tr((p). Starting with the first term on the R.H.S. of (|20| . and averaging 
over the resampling time, one has 

N 
i=l 

where we have set u(d) ~ ld(t m *(d)) : d. Recall that w l u ^ denote the normalized weights. By the 
asymptotic independence result in Lemma lA.2f ii) we have that 



N N 

hm jf Y, E K (d) mxii) } 2 } = w E [zZ y£^ { ^ YI) }2 

i — 1 i— 1 



N 



where {Z t }fL 1 are i.i.d. from N(0, of : i) and, independently, V 1 ,...^ i.i.d. from 7r. We now 
look at the second term on the R.H.S. of ([20"]) . Averaging over the resampling index and invoking 
again the asymptotic independence result of Lemma IA.2f ii) we have: 

JV 



(d) w u(d) . 



N 



(21) 
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for random variables {Z l }fL 1 as denned above (in the last calculation we took advantage of exhange- 
ablity). We have the decomposition (writing ° 2 = a? . ;1 for notational convenience): 



We concentrate on the second term. Using Holder inequality we have: 



E 



(^_(SV=-)>) 



< E5 



N „Z l \3 



ElL^L\ 2 

N J 



Setting Z^ 1 ' := mini<i<jv Z % we get that (using also Cauchy-Schwarz) : 

E [ ygf^ ] < ^ E [ e 3 ^ ] < £ E* [ e 6 ^ ] E^ [ e" 6 ^ } . 

By standard results on order statistics the pdf of Z^ 1 ' is upper bounded by N times the pdf of N(0, a 2 ). 
So, we have that: 



El 



< Ne 



18a* 



By adding and subtracting e a in the summand and multiplying the square, one can use Minkowski 
and the Marcinkiewicz Zygmund inequality to obtain: 



E" 



< 



Me 



6a" 



ATl/2 



for some M < oo that does not depend upon N or a 2 . Putting together the above arguments, we have 
shown that the right-hand part of the R.H.S. of (|21[) . when d — > oo, is upper-bounded by the quantity 



Var w ((/?)(-^e CT + Me 17 " N \/ e ) which completes the proof. 



□ 



A. 3 Normalizing Constants 

Proof of Theorem \3. 6 A By the expression of the normalized variance (and the fact that the different 
particles are i.i.d.), one can re-center to rewrite: 

V 2 (7 d (D) = E[(#M) 2 ] 



with 



N 



l N d 



(1)4E e ~ d Ei=1 G) i 7^(1) = E [ G 3 ] j 



i = l 



where we have now set 



d-1 



G) = (1 - 4>o) (9«j) ~ E \9{K,j) ] ) 



n=0 



and iefl,..., N}, j G {!,..., d}. We have that: 



E 



= l-_a iy E[^(l)] + 3 ^E[73 r (l) 
^-l + ^E^l) 2 ] 



(22) 



(23) 
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where we have used the unbiasedness property (i.e. £[7^ (1)] = 7^(1)) of the normalizing constant, 
see e.g. QT]- We define Z\=\ Y? j=1 G) for G) defined in (JUJ) and 1 < i < N. Thus, due to Z^s 
being i.i.d., we have: 

E[^(i) 2 ] = ^[^] + (i4)E 2 [^]- 

By Lemma I A. lf iii) . applied when tk-i(d) = 4>o and tk(d) = 1, one has that: 
E[e 2 ^ ] cx P {2 ( t2 o:1 } ; E[e z i] ->> cxp^a 2 ^} . 
Using these limits in (|23|) and recalling also that 7^(1) = E [e Zd ], gives the required result. 
Proof of Theorem \3.3\ . Denote: 



□ 



N 



ilk 



(24) 



for the standardised Ctj. • in ([5]). We look at the relative L2-error: 



m* + l 



v 2 ( n 7d, fe (i) 



E 



m *+i 
TT T^W 1 N 
11 7 d . fc (l) 1 



fc=l v A;=l 

Using the unbiased property of normalising constants, see e.g. we have: 



m*+l 



E 



[< n 1 



k=l 



m*+l _ 

[n 1 



7d, fc (l) 2 



1 



For notational convenience, we set 
-£,(i) 2 



A fc 



■9 : ^i- w : — E_®jv 



7d. fc (l) 2 



k k 
q=l 9 =1 



Following the definitions of 7^.(1) and j d k (l) in (|2"4")l . and exploiting independence among particles 
under ^f k N ^ d y we have that: 



E ® JVd 



7 d , fc (l) 2 



^E[e3 ^i=i G *,j ] + (l - ^)E 2 [e3 £*=i G <= 



]g2 r e i Ej=i Gfc, 3 1 



*fe-i :t fc 



[ e 2 ^- 1 ^^ + ( l_^) e ^- 1 : tfc ] 



with the limit obtained from Lemma I A. 1 If iii) ■ Therefore: 



m* + l 



<W+i), d -> e"^ ; ; [ i e 2 ^—* + (1 - e ff *-i=«* ] . 



fe=i 



Thus, it suffices to show that the following difference goes to zero as d — >• 00: 



yl d := E [ Ai : ( m . +1 w] - <5i : ( m » 



:(m* + l), 
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Now, note that a simple identity gives that: 

m* +1 



A d 



E[A 1:(fe _ 1))d (E[A M |^_ i(d) ]-5 M )] •*(*+!); 



(m*+l),d 



k=l 



under the conventions that Ai:o,d = <^( m *+2):(m*+i) = !• Applying Cauchy-Schwarz yields the following 
upper-bound: 

E[A 1:(ib _ 1)id |E[A M |«^_ i(d) ]-4,d|] < 

EV» [ A? :(fc _ 1)id ] ■ E^ [ | E [ A M | ] - S k>d | 2 ] . 



Via Lemma I A. 31 the second of the terms in the bottom line vanishes in the limit, so it suffices to show 
that the first term in the bottom line is upper bounded uniformly in d. Using the Cauchy-Schwarz 
inequality, we have that: 

E[A? :(fe _ 1)id ]<nE^ [A ^]. 

q=l 

Recalling the definition of Ak.d = ^ d ' k ^2 from (|24|). using triangle inequality for norms we have: 

N 

E[^ q (ir]<(^j2 El/8 [ eii:UG "] 

i=i 

Now, we complete via Lemma [A. 61 □ 
Proof of Proposition [X71 To simplify the notation we drop i for the particle number and define: 

d 

Qi.d = ^2 Gl -o ' 



for 1 < I < k. Our proof proceeds by induction. For k = 1, the result follows by Lemma lA.lf iii). 
Assume that the result holds at time k — 1 > 1. Then we have the simple decomposition: 



E [ e £ti c ' g '-^ d ] = E |~ E [ e Cfc gk - d/d I } { e E '=' c ' g '- d/d - E [ e S£=i c < e ^/ d ] } 

+ E [ e^'=' Cl g '- d/d }E\e Ck Gk - d/d 1 . 



(25) 



We begin by dealing with the first term on the R.H.S. of (|2~5j) . By Lemma [A. 41 we have that: 

; c fc g fc /d J jj-TV 1 _ Iff „, \ c c k G k /d] 

whereas from Lemma lA.lf iii) we have: 

E»ti 



E [ e c » Q *' d I } - E^ d [ e c ^ d } 



Moreover, by the induction hypothesis: 



(26) 
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The expression in the expectation of the first term of (j2"5)l is uniformly integrable: indeed, careful and 
repeated (but otherwise straightforward) use of Holder and Jensen inequalities will eventually give 
that: 



E [ e Ck Gk -^ d | ^ t N k _ l{d) } { c ' G ^' d - E [ ^=7 c ' G ^/ d } } 



k-1 



< 



m 1 1 c 



,Ef=icJ 5t, d /d U 1/(1+5!) 



;=i 



for positive constants c\. k , Si-k, M independent of d. As a consequence, convergence in distribution 
implies also convergence of expectations: 



,c k Qk,d/d 1 r^N 



Ci Qi,d/d 



]} 



E[e fe *k-i :t k' { e^ i=1 Ci 1 - e 2 1=1 ' } ] = 

Now turning to the second term on the R.H.S. of (|2~5l) . we work as follows: 



E [ e c fc g fc , d /dj = E\E[e CkGk - d/d \.^ t N k _ i{d) ] -E,« 



®<j | e 
fc-i< d ) 



Cfc Qk,d/d- 



+ E ®d 



Cfe Q k ,d/d 1 



from Lemma IA.4I and (|26l) . We can thus deduce by the induction hypothesis that 



c,Z l ■ 



EfeEt/^C.d/d] E[e CkGk/d ] -> e 2 E,=1 Cj0 Vi :t < = TTE[e 

i=i 

which completes the proof. 

Lemma A. 3. Assume (JUKE) and g £ Bb{E). Then for any e > 0, N > 1 and 1 < k < m* + 1: 



□ 



3 



7 dU(i) 2 I i TP r Td.fcU-i i , n -v, it 



7? fc (l) 2 



+ c ■ 



Proof. Due to conditional independence among particles given /«, we have: 



(27) 



E[7^(l) 2 K_ l(d) ] = 

= £ ( E [ E el ^ Gij i K- lW ] + E E t E - ° li i K-m ] E [ * ^ u G - 1 ^ « ] 

1 z^m 

Now, for any constant c > 1 we have sup d E g»d [e 3 ^3 =1 Gk -o 1 < oo from Lemma [A.6[ so it suffices 
to prove that for any constant c > 1, as d — > oo: 



E [ e #E 3 d =1 GL 3 |^_ i(d) ] _E^ Nd [ e f T,U G U] _>() , in L 2(1+e 



(28) 



The factor of two in the norm arises as own has to use Cauchy-Schwarz to separate the product terms 
on the R.H.S. of (|27[). Now, Lemma rA.4l established the above convergence in probability; this together 
with uniform integrability implied by Lemma lA.61 establishes the result. □ 
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Lemma A. 4. Assume (J^T^j and that g G Bt(E). Then, we have that for any N > 1, i G {1, . . . , N}, 
k G {1, . . . , m* + 1} and c G M: 

Proof. By the conditional independence along j, we have: 



E [ G k,j I *JV I f ,; .. T e i 1 

L I tfc-i(d)J 11 Xi d (t k _ 1 (d)),j i 



3 = 1 



We will now omit various subscripts/superscripts to simplify the notation, using also E,r = E Wt ^ 



and Ey eEv- . We can rewrite: 



E [ Si=i I ^JV j _ E ^ Nd ^ e f E - = i g j ] = 

-(nM^Otnt '^y' + '}-']- (29 » 



From Lemma [ATTT iii) it follows that Ylj = i E,r [e cG ^ d ] — > e 2 c at *-^ %t * , hence we can now concentrate 
on the second factor-term on the R.H.S. of ([25]) . We will replace the product with a sum using 
logarithms. To that end define: 

{E # -E, }[e cG i /d ] 



,_ E 7r [e'= G ^ d ] 

Note that since g G Bj,(E), we have that Gj/fi is bounded from above and below, so there exist an 
e > and M > such that: 

- 1 + e <0j(d) < M < oo . (30) 

We need to prove that e^-J =1 lo s( 1+,3 j( d ^ — 1 ^> 0. We consider a second order Taylor expansion of the 
exponent: 

d d 

£ log(l + &(d)) = £ { /%(d) - | #(d) } (31) 

5=1 3=1 

where £j(d) G [0 A j8j(d), V &(d)]. By Lemma [Q1 we have that: 

d d 

3=1 3=1 

Since £j(d)'s are bounded due to (f30|) . these two results imply via the Taylor expansion in (|3Tj) that 
also 2j=i l°g(l + Pj(d)) — ?> 0. Due to the continuity of the exponential function, this implies now that 
e 5D j= i i°g( 1 +/ 3 j( d )) —i^o and the proof is now complete since weak convergence to a constant implies 
convergence in probability. □ 
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Lemma A. 5. Assume (^42H1P and g 6 Bt,(E). Then we have that for any N > 1, i € {1, ...,N}, 
k e {1, . . . ,m* + 1} and c e E: 

(«') E , ,J ► , ml! 



(«) E 

3=1 



3=1 ^tk-lW [e - 1 



, in Li 



Proof. To simplify the presentation, we drop many super/subscripts: that is, we write the quantity of 
interest as: 

E 7r [e cG */ d ] 

Note that E„ [e cG ^ d ] = E T [e cGl / d ]. Since |c/| is bounded, \G x /d\ is also bounded, so [e cG '/ d ] 
is lower and upper bounded by positive constants and can be ignored in the calculations. We will be 
using the second-order Taylor expansion: 



(32) 



where e [0 A -fi ,0 V -f^ ; 



Proof of (i): 

The Li-norm of the variable of interest is upper bounded by (recalling that E^ [Gj] = 0): 



E |E%o,[^] 

3=1 



IE* }[ 

3 = 1 



The first term in this bound goes to zero by Lemma lA.lf iv) . Thus considering the second term, we 
have the trivial inequality (for convenience we set a 2 = of _ ;f ): 



(33) 



E 


d 

E{%o.-- E -}[ 

3=1 


^(^) 


2 ] 


< 












E 


E%o.J(^ (d) - 


D(^) 2 ] 


+ 


E |E%- .J(^) 2 




3=1 






3=1 


Note that 









• |£i(d) < M (due to the boundedness assumption on g) ; 

• £i W - ^ in distribution (so also in L p for any p > 1 due to the above uniform bound) ; 

• E^ [ G 2 /d] -> a 2 , 

with the last two results following from Lemma [A..lf i.ii). These results, together, imply that the last 
term on the R.H.S. of (f3"3"]l goes to zero. For the first term on the R.H.S. of (|3"3"|) we work as follows. 
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Since for each j, \Gj/d\ is bounded we have that je^'W — 1 < M \£,j(d)\ < M As a result, using 
the triangular inequality and then this latter bound we have that: 

E | E [ (<* W - 1) ( ^ ) 2 ] | < f E E [ \<*i I 3 ] = f E E to I 3 - 

From Lemma fA.lf ii) we have that this latter term is upper-bounded by jpdd 3 / 2 — > 0. Now, for the 
second term on the R.H.S. of ([55)1 we work as follows. We have that: 

, 2 - 



Ev [G. 



[G)]= % {IJ . [ ( Gj E*„ J [Gj]) ]+ [ Gj ; 



Lemma EE^i) gives that | [Gj ] | < M, so we have that ^ £)* =1 E| q [Gj] -> in Lj. The 
result now follows from Lemma lA.lf iv). 

Proof of (ii): 

We will use again the Taylor expansion (|32p . Clearly, the Li-norm of the random variable of interest 
is bounded by: 



2E e [( e a-, j [^]) 2 ]+ 2 E e [({%„-^}[M^ 



3=1 3 = 1 

The first term goes to zero from the first result in Lemma lA.lf ii) and the second from the second 
result in Lemma [A.lf ii) applied here for p = 4. □ 

Lemma A. 6. Assume ('42HIP and g € Bb{E). Then we have that for any N > 1, i £ {1, . . . , N} and 
k € {1,. . . , m* + 1} and any fixed c € K: 

sup E [e% G ' k -i ] < oo . 

d 

Proof. To simplify the notation we rewrite the quantity of interest as 

d 

E[e*£*=i G i] = E [ II E x . 3 [ efGj ] ■ 

3 = 1 

Applying a second order Taylor expansion for e^ Gj yields that the above is equal to: 

E [ri( i + cE xo.j^]+4%„, J [(^ (d) ^) 2 ])" 

3 = 1 

with £j(d) £ [OA ^p">0 V ^ J -}- Using the fact that \Gj/d\ is upper bounded by a constant, from 
Lemma lA.lf ii) we have: 

|c%o,[%]|<M¥ ; 4% . 3 [(e^ d) ^) 2 ]<c 2 ^. 

Hence, we have that: 

E[ei^]<(l + M)<* 
with the latter upper bound converging by standard results in analysis. □ 
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A. 4 Propagation of Chaos 

Proof of Proposition \3.2[ For simplicity, consider the first q of N particles and j = 1. Then, for a 

1:<? _ f Vl vQ 
s(d),l — ^s(d),!' ' ■ • ' s(d), 



function F : E q — > [0, 1] we have, using the notation X g fL 1 = (X\ d \ 1; ■ ■ • , ^(d) i) : 



\E[F(X^ d)1 )]-nfHF) | < lEinX^)] -E^jFiXfyj] \ 

+ | E^ i(dj [FiX 1 ^)) - nf ( " d) (F) | 4- | nf {d) (F) - tt? '(F) | . (34) 

The last term on the R.H.S. goes to zero via the bounded convergence theorem (this follows directly 
from having assumed that g is upper bounded), so we consider the first two terms. For the first term 
on the R.H.S. of (|34|) one can use conditional expectations and write it as: 

\ i &n 1 _ n? r tpi y u i 



3 



where , d s is the filtration generated by the particle system up to (and including) the (n — l) th 

resampling time. The quantity inside the expectation can be equivalently written as: 

{<V*«LiM>>.i> ') ~ (^ l{d) K id) f q }(F) (35) 
where we set u(d) = (l d (t]~-i(d)) + 1) : ld(s(d)). For 1 < I < q we define the probability measures: 

Hi = fM(dy 1:{l . 1}l dy {l+1):q ) = (TT tk _ l(d) k u[d) ) ® k u(d) (X ld{tki{d))1 , • ) . 

Notice the simple identity (since intermediate terms in the sum below will cancel out): 

i(d)),l> ' ) ~ K-i(<l)^{<f) ) q }(dyi-. q ) (36) 

= ( k <d){X\ d(tk _ l{d))ill ■) - n tk _ l(d) k u ( d) )(dyi) ® M/(^l:(/-l)^y(i+l): 9 ) ■ 
1=1 

Since |F| < 1, we have \J Hi{dyi:(i~i)-,dy^ + i)- q )F{iji. q )\ < 1 for any yi. Given this property, using the 
identity (|36[) we have that the expression in (|35[) is bounded in absolute value by: 

V" [si firi \ j, \m J /^(^i=(i-i).^(t+i)= g )^(yi=g) 

f^h dKk Wh {snp yim \J ni{dy 1:{l _ 1) ,dy { i +1) . q )F{y 1:q )\ 



< J2 \\ k <d){X l ld{ t k _ l{d )),i) ~ nt k ^{d)K 



(d) \\tv 



1=1 



The above total variation bound converges to zero in Li as d — > oo by Lemma IA.2f i) , so also the first 
term on the R.H.S. of (|3"4"|) goes to zero as d — > oo. The second term on the R.H.S. of can be 
treated in a similar manner. One has again the identity: 

Emn [F(X 1 i«.)]-Tr® q ,JF) = 
q „ 

= J K { 7r t h _ 1 (d)ku(d) -7r s (d) }{dxi) x {7rf ( ^ _1) ® ( 7 r tfc _ l(d) fc ll(c;) )® ( ' z " ( J F 1 (x ; ))} 

< <? Il 7r t fc _ 1 (d) fc «(d) - 7T S (<i) || tt) • 

This last bound which will go to zero by Lemma [A. 21 4). Hence we conclude. □ 
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